
\subsection{Instruction Latency Requirements}
\label{sec:scalar:timing}

Any computation which operates on cryptographic secrets must not
exhibit execution latency which is correlated with the value of the
secret.
Integer multiply and carry-less multiply instructions are principly affected
by this requirement.
Cores which implement the scalar cryptography
extension must interpret the following {\em conditions}\footnote{
Prior versions of this specification described the constant-time indicator
as a {\em hint}. In RISC-V terminology, a hint is a totally optional
indicator to the core, and may always be ignored.
This mechanism is catagorically not a hint in this sense.
It {\em must} be handled by the core, and cannot be ignored.
The word {\em hint} has hence been removed from the specification to
emphasise this point.
} to provide some
guarantees of constant time behaviour to cryptographic code:

\begin{itemize}
\item {\tt rs1 <= rs2} - 
    The instruction {\em must} execute with a latency which is
    independent of the values of its input operands.
\item {\tt rs1  > rs2} - 
    The instruction {\em may} execute with a latency which is
    independent of the values of its input operands, but is not
    guaranteed to do so.
\end{itemize}

This condition is applied to the following instructions:
{\tt mul},
{\tt mulh},
{\tt mulw},
{\tt mulhs},
{\tt mulhsu},
{\tt clmul},
{\tt clmulh}.
This covers all of the integer multiply instructions in the
M extension, and the carry-less multiply instructions required by the
scalar cryptography extension.

\begin{displayquote}{\em
We considered alternative approaches to this.
For example, a CSR containing an ``enable constant time mode'' bit.
While this is a reasonable approach, we opted to avoid adding
state (and so complicate things like context switches and CSR accesses)
or additional instructions.
}\end{displayquote}

Cores which do not implement any data-dependent optimisations in their
multipliers (e.g., early termination, memoization) do not need to add
any additional logic to interpret this indicator.
Note that this indicator must also be interpreted when {\em fusing}
sequences of multiply instructions. For example, the sequence
\begin{verbatim}
mulh t0, a0, a1
mul  t1, a0, a1
\end{verbatim}
{\em may} be fused into a single operation inside the core.
However, because it meets the {\tt rs1 <= rs2} constraint, the fused
operation must still execution with a latency which is not dependent on
the input operands.
When fusing instruction pairs which partially meet the condition:
\begin{verbatim}
mulh t0, a0, a1 // a0 <= a1, latency must be independent of GPR[rs1], GPR[rs2]
mul  t1, a1, a0 // a1 >  a0, not guaranteed to have data-independent latency.
\end{verbatim}
cores do not need to abide by the data independent latency constraint,
as this is consistent with the un-fused sequence.
Note that the register address choices {\em may} result in different
execution latencies, so long as the numerical values contained in the
sourced registers do not affect the execution latency.
For example,
{\tt mul a1, a2, a3} (a1 = a2 * a3)
may have a different latency profile from self-assignment
{\tt mul a2, a2, a3} (a2 *= a3)
or squaring
{\tt mul a1, a2, a2} (a1 = a2 * a2) - 
so long as the {\em contents} of {\tt GPR[a2/a3]} do not affect
the execution latency.
None of these latency requirements apply to floating point operations.

